Abstract:
The present invention relates to a computer-implemented method for predicting a time series based on one or more indicator time series; wherein said time series relates to an observed period, wherein the time series comprises a plurality of physical entity input values associated with a plurality of times occurring during said observed period; wherein said one or more indicator time series each relates to an associated indicator period, each indicator period at least partially overlapping with said observed period, each indicator time series comprising a plurality of indicator input values of the physical entity associated with a plurality of indicator times that occur during the associated indicator period; wherein the method comprises a plurality of steps (a) to (f).
Publication number: BE1026185B1
Application number: E2018/5518
Filing date: 2018-07-17
Publication date: 2019-11-04
Inventors: Gylian Verstraete; David Vulsteke
Applicant: Solventure Life Nv
Main IPC class:
Description:

IMPROVED FORECAST METHOD
Technical field
The invention relates to the technical field of data processing of records of a physical entity with a view to predictions.
Background
The increasing availability of digital data offers both opportunities and challenges. The potential of this data for increasing the effectiveness and efficiency of actions and decision making is enormous. Using this data with effective tools can transform decision-making from reactive to proactive and predictive. However, the volume, variety, and velocity of this data can actually reduce the effectiveness of analysts and decision makers by creating cognitive overload and analysis paralysis, particularly in environments where decisions need to be made quickly.
It has become crucial for predictions and analyses to select statistically meaningful data sources that go beyond the actual data being predicted. Therefore, there is an unmet need for prediction methods and systems that make effective use of external data sources.
US 2017/0011299 describes a visual analysis system and method that provides a proactive and predictive environment to help decision makers effectively make resource allocation and resource decisions. The challenges associated with such predictive analytical processes include end-user understanding, and applying the underlying statistical algorithms at the appropriate spatio-temporal granularity levels so that good prediction estimates can be determined. The approach described provides a set of natural-scale templates and methods that allow users to focus on and address appropriate geospatial and temporal decision levels. The prediction technique described is based on the Seasonal Trend decomposition based on Loess (STL) method, applied in a spatio-temporal visual analytical context to provide analysts with predicted levels of future activity. A new kernel density estimation technique is also described, in which the prediction process is influenced by the spatial correlation of recent incidents at nearby locations.
US 2016/0323158 describes a method that includes obtaining time series data. The time series data include a plurality of network usage measurements. The plurality of network usage measurements is indicative of a plurality of uses of one or more sources of a network resource at a plurality of times. The method also includes determining whether the time series data comprise a plurality of segments. Each segment of the plurality of segments is associated with a separate regression model, and each segment comprises part of the time series data. The method further comprises identifying a current segment from the time series data when the time series data comprise the plurality of segments. The method further comprises determining an estimated network usage based on a current regression model associated with the current segment.
US 2017/0011299 and US 2016/0323158 provide no means for making effective use of external data sources in predictions, and do not describe how data from external sources should be taken into account without ending up with an overly complex system.
The present invention has for its object to solve at least some of the above problems.
Summary of the invention
The present invention provides in a first aspect a computer-implemented method according to claim 1. In addition, the term for each indicator time series is to be interpreted as a reference to each of the different data sources associated with the respective indicator time series.
Such a method advantageously incorporates the knowledge of one or more indicator time series in the prediction of a specific time series. Prior art methods are known that are based on decomposition of the time series into a trend component and a seasonal component, but these methods are primarily aimed at identifying the trend component in a first step, calculating a predicted trend based on this trend component, and then adding the seasonal component in the final step to obtain a predicted output value. However, such methods do not incorporate the knowledge of one or more indicator time series. On the other hand, there are prior art methods based on regression that incorporate one or more indicator time series, but these methods have no means to account for the variation present in a time series comprising a seasonal component. Accordingly, the present invention advantageously describes how decomposition into trend and seasonal components is to be combined with regression. Furthermore, prior art methods that allow information from one or more indicator time series to be incorporated often have no means for doing so selectively. Selectivity is vital to prevent the problem of overfitting, where the use of too many and/or insufficiently relevant indicator time series can lead to a limited view of the time series being observed, resulting in poor generalization capacity and generally poor predictive performance. By selecting the most relevant indicator time series and basing the forecast only on said indicator time series, this problem is prevented or at least reduced. In general, this leads to superior performance of the present invention compared to traditional approaches such as Holt-Winters, ARIMA or ETS. A discussion of ARIMA, ETS and related traditional approaches is given in (R. J. Hyndman, G. Athanasopoulos, Forecasting: principles and practice, ISBN 978-0-9875071-0-5, otexts.com, 2014). This superior performance therefore comes from the advantageous combination of trend and seasonal decomposition with regression, while providing appropriate means to prevent overfitting to the original time series.
In a second aspect, the present invention provides a system according to claim 15.
In a third aspect, the present invention provides a use according to claim 17.
The advantages of the system and the use are similar to those offered by the method according to the present invention. Preferred embodiments and their advantages are discussed in the detailed description and the dependent claims.
Description of the figures
Figure 1 shows an example of the architecture of the system according to the present invention.
Figure 2 shows a detailed view of the example of the architecture of Figure 1.
Figure 3 shows an example of search architecture.
Figure 4 shows an example of a data model with regard to the present invention.
Figure 5 shows an example of a flow diagram of the method according to the present invention.
Figure 6 shows a first graph with respect to numerical examples illustrating the present invention.
Figure 7 shows a second graph with respect to numerical examples illustrating the present invention.
Figure 8 shows a third graph with respect to numerical examples illustrating the present invention.
Figure 9 shows a fourth graph with respect to numerical examples illustrating the present invention.
Detailed description of the invention
In this document, the term data source refers to a database that is available for user access. If the database is stored on a computer infrastructure that is managed by the user and/or is managed by the user through a third-party cloud service, the data source may refer to the database that is present on said computer infrastructure and that is accessed by the user, e.g. locally via a LAN and/or the intranet, or remotely via the internet or an equivalent communication technology, preferably via a cloud-based interface. If the database is not managed by the user, the data source may, for example, relate to a data stream available to the user over the internet or an equivalent communication technology, preferably via an application programming interface (API) or similar.
The terms time shift and lag are used interchangeably in this document. The time shift as used in the invention may relate to a lag compensation that may be inherent in certain indicator time series. In one example, a first indicator time series (and, equivalently in this context, a first indicator trend component) is associated with a particular unknown lag A, while a second indicator time series may be associated with a particular lag B that is also unknown. The invention can advantageously compensate by shifting the indicator time series over time as required. In the example, this may relate to time-shifting the first indicator time series by a duration of (approximately) A, and time-shifting the second indicator time series by a duration of (approximately) B. However, in view of the usually coarse granularity of the input values, with for example only one value per month, it may happen that the actual lag is not a whole number of months, but rather an intermediate value of N months and M days, where N and M are integers. In such a case, the embodiment advantageously allows the first indicator time series to be used twice, for example by considering the time-shifted indicator time series both for lag N and for lag N + 1. This is illustrated, e.g., in Examples 4-5.
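The handling of a lag that is not a whole number of sampling intervals can be sketched as follows. This is a minimal illustration only, assuming monthly values held in a plain Python list; the function names `shift` and `candidate_shifts` are hypothetical and do not appear in the claims.

```python
from math import floor


def shift(series, lag):
    """Delay a series by `lag` whole sampling intervals (e.g. months).

    The first `lag` positions are filled with None because no indicator
    value is available there; the result keeps the original length.
    """
    if lag < 0:
        raise ValueError("lag must be non-negative")
    return ([None] * lag + list(series))[:len(series)]


def candidate_shifts(series, lag_in_months):
    """Return the time-shifted copies to consider for a possibly fractional lag.

    A lag of N months and some days falls between two grid points, so the
    series is offered twice: once shifted by N and once by N + 1.
    """
    n = floor(lag_in_months)
    lags = [n] if lag_in_months == n else [n, n + 1]
    return {lag: shift(series, lag) for lag in lags}
```

Both shifted copies can then be scored in step (d), leaving it to the selection in step (e) to retain one, both or neither of them.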
STL belongs to the broader class of seasonal adjustment procedures in the sense discussed in (J. B. Carlin and A. P. Dempster, Sensitivity Analysis of Seasonal Adjustments: Empirical Case Studies, Journal of the American Statistical Association 84 (405): 6-20, 1989), which is hereby incorporated by reference.
STL is a filtering procedure proposed in (R. B. Cleveland et al., A seasonal trend decomposition procedure based on loess, Journal of Official Statistics 6 (1): 3-33, Statistics Sweden, 1990) for decomposing a time series into trend, seasonal and remainder components. STL has a simple design that consists of a sequence of applications of the loess smoother; the simplicity allows analysis of the properties of the procedure and allows rapid calculation, even for very long time series and large amounts of trend and seasonal smoothing. Other features of STL are specification of the amounts of seasonal and trend smoothing that range, in an almost continuous manner, from a very small amount of smoothing to a very large amount; robust estimates of the trend and seasonal components that are not distorted by abnormal behavior in the data; specification of the period of the seasonal component as any integer multiple of the time sampling interval greater than one; and the ability to decompose time series with missing values.
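The idea of splitting a time series into trend, seasonal and remainder components can be illustrated with the sketch below. Note that this is not STL: for brevity, a classical additive decomposition with a centered moving average is substituted for the loess smoother, and the function names are hypothetical.

```python
def centered_moving_average(values, period):
    """Trend estimate by a centered moving average spanning one full period.

    For an even period the two end points of the window get half weight,
    so that the window stays centered on t. Edges are left as None.
    """
    n = len(values)
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = values[t - half:t + half + 1]
        total = sum(window)
        if period % 2 == 0:
            total -= 0.5 * (window[0] + window[-1])
        trend[t] = total / period
    return trend


def decompose_additive(values, period):
    """Split values into (trend, seasonal, remainder), assuming value = sum of the three."""
    if len(values) < 2 * period:
        raise ValueError("need at least two full periods of data")
    trend = centered_moving_average(values, period)
    # Average the detrended values per position in the seasonal cycle.
    buckets = [[] for _ in range(period)]
    for t, (value, level) in enumerate(zip(values, trend)):
        if level is not None:
            buckets[t % period].append(value - level)
    means = [sum(bucket) / len(bucket) for bucket in buckets]
    offset = sum(means) / period  # force the seasonal effects to sum to zero
    seasonal = [means[t % period] - offset for t in range(len(values))]
    remainder = [
        None if level is None else value - level - effect
        for value, level, effect in zip(values, trend, seasonal)
    ]
    return trend, seasonal, remainder
```

Unlike STL, this simplified variant is neither robust to outliers nor tolerant of missing values, which is one reason the description prefers STL.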
STL is an acronym for Seasonal and Trend decomposition using Loess, while loess is a method for estimating non-linear relationships. STL is based on loess, which was originally proposed in (W. S. Cleveland, Robust Locally Weighted Regression and Smoothing Scatter Plots, Journal of the American Statistical Association 74 (368): 829-836, 1979) and further developed in, e.g., (W. S. Cleveland et al., Locally-Weighted Regression: An Approach to Regression Analysis by Local Fitting, Journal of the American Statistical Association 83 (403): 596-610, 1988). An advantage of loess is that it does not require the specification of a function to fit a model to all data in the sample. In addition, loess is very flexible, making it ideal for modeling complex processes for which there are no theoretical models. An important caveat in the use of loess is that it is sensitive to the effects of outliers in the time series and/or indicator time series. This is advantageously addressed by various preferred embodiments of the present invention in which the problem of outlier detection is explicitly addressed.
The invention offers a method, a system and a use. As will be apparent to one skilled in the art, these aspects are highly related, and a measure proposed as a preferred embodiment intended for the method can also be applied to any other aspect of the invention. In this document, all preferred embodiments therefore relate to all aspects of the invention, and any reference to a single aspect should not be interpreted as limiting the scope of the invention to only this aspect.
In a preferred embodiment, said one or more indicator time series relate to two or more indicator time series, wherein preferably said selecting in step (e) comprises omitting at least one of said time-shifted indicator trend components from said selection.
In another preferred embodiment, said selecting in step (e) comprises comparing said regression scores and/or a normalized form of said regression scores with a predetermined threshold value, and/or said selecting is further based on a number of already selected time-shifted indicator trend components when determining whether an additional time-shifted indicator trend component can be selected, wherein preferably a tendency for selecting an additional time-shifted indicator trend component decreases monotonically with said number of already selected time-shifted indicator trend components.
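One possible reading of this selection rule is sketched below. It assumes regression scores normalized to the range 0 to 1, and it models the decreasing tendency by raising the required score linearly with the number of components already selected; the parameters `base_threshold` and `penalty`, their default values and the function name are hypothetical choices made for illustration.

```python
def select_components(scores, base_threshold=0.5, penalty=0.05):
    """Select time-shifted indicator trend components by regression score.

    `scores` maps a component label to its regression score. Candidates are
    visited from highest to lowest score. The score required for admission
    rises with every component already selected, so the tendency to add a
    further component decreases monotonically.
    """
    selected = []
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    for label, score in ranked:
        required = base_threshold + penalty * len(selected)
        if score < required:
            # Later candidates score no higher and face the same bar, so stop.
            break
        selected.append(label)
    return selected
```

With scores of 0.9, 0.58, 0.56 and 0.2, for example, the first two components are admitted, while the third fails the raised bar of 0.6 even though it would have passed the initial threshold of 0.5.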
In yet another embodiment, said step (b) further comprises decomposing each of said indicator time series into an indicator seasonal component, said prediction in step (f) being further based on the indicator seasonal components associated with the time-shifted indicator trend components selected in step (e), preferably wherein said prediction in step (f) is further based on time-shifted indicator seasonal components that are shifted in time from their associated indicator period to said observed period and are associated with the time-shifted indicator trend components selected in step (e).
In a preferred embodiment, said step (f) comprises predicting an indicator output value of the physical entity for at least one of said time-shifted indicator trend components selected in step (e), based on at least said time-shifted indicator trend component and its associated indicator seasonal component and/or time-shifted indicator seasonal component, wherein said prediction of said output value in step (f) is further based on said predicted indicator output value.
In yet another preferred embodiment, said time shift of each indicator trend component in step (c) is performed from its associated time period to a predetermined tactical period different from said observed period, wherein said calculation of regression scores in step (d) and/or said prediction in step (f) are preferably made in view of said tactical period instead of said observed period, said tactical period preferably being no shorter than 1 month and preferably no longer than 18 months.
In a preferred embodiment, said prediction in step (f) comprises the following sub-steps:
- calculating an output trend value of the physical entity with respect to said trend component, wherein said output trend value relates to a time occurring later than said observed period; wherein said calculation is based on the trend component as well as on said time-shifted indicator trend components selected in step (e);
- calculating the output value of the physical entity based on said output trend value of the physical entity and said seasonal component.
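The two sub-steps can be sketched as follows for the simplest case of a single selected indicator trend component, an ordinary least-squares line and an additive seasonal component. All three are simplifying assumptions made for illustration, and the function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_next_output(trend, seasonal, indicator_trend, lag, period):
    """Predict the output value one step after the observed period.

    `trend` and `seasonal` are the components of the observed time series;
    `indicator_trend` is the selected indicator trend component on the same
    time grid, which leads the trend by `lag` steps (lag >= 1).
    """
    n = len(trend)
    if not 1 <= lag < n or period > n:
        raise ValueError("lag and period must fit inside the observed period")
    # Regress the trend on the indicator trend observed `lag` steps earlier.
    slope, intercept = fit_line(indicator_trend[:n - lag], trend[lag:])
    # Sub-step 1: output trend value at time n, which lies after the
    # observed period, from the indicator value already known at n - lag.
    output_trend = intercept + slope * indicator_trend[n - lag]
    # Sub-step 2: reintroduce the seasonal effect for the same position
    # in the seasonal cycle (one full period earlier).
    return output_trend + seasonal[n - period]
```

Because the indicator leads the time series by the lag, indicator values that are already observed suffice for the first `lag` future steps; predicting further ahead would require a forecast of the indicator itself.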
In a preferred embodiment, said decomposition in step (b) is preceded by an automated detection of one or more outliers, said one or more outliers preferably relating to additive outliers and/or level shift outliers and/or seasonal level shift outliers. This is advantageous because it addresses a known problem in the use of loess, namely that it is sensitive to the effects of outliers in the time series and/or indicator time series.
In a preferred embodiment, said automated detection is based on an autoregressive integrated moving average (ARIMA) model and/or said automated detection comprises applying Student's t-test, preferably wherein said automated detection is based on a combination of the aforementioned ARIMA model and the aforementioned Student's t-test.
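A strongly simplified sketch of such a detection is given below. It assumes the simplest possible ARIMA model, a random walk (ARIMA(0,1,0)), whose residuals are the first differences of the series, and it compares a t-type statistic for an additive outlier with a fixed critical value that stands in for a quantile of Student's t distribution. The function name, the robust scale estimate and the critical value of 3.5 are illustrative assumptions rather than part of the described method.

```python
from math import sqrt
from statistics import median


def most_significant_additive_outlier(values, critical=3.5):
    """Return (time index, statistic) of the strongest additive outlier, or None.

    Under a random-walk model, an additive outlier of size w at time t adds
    +w to the residual at t and -w to the residual at t + 1. The estimate of
    w is therefore half the difference of these two residuals, and dividing
    it by its standard error gives a t-type statistic.
    """
    if len(values) < 4:
        return None
    # resid[i] is the one-step residual of the random walk at time i + 1.
    resid = [later - earlier for earlier, later in zip(values, values[1:])]
    centre = median(resid)
    # Robust residual scale via the median absolute deviation.
    sigma = 1.4826 * median([abs(e - centre) for e in resid])
    if sigma == 0:
        return None
    best = None
    for t in range(1, len(values) - 1):
        stat = (resid[t - 1] - resid[t]) / (sigma * sqrt(2))
        if best is None or abs(stat) > abs(best[1]):
            best = (t, stat)
    if abs(best[1]) > critical:
        return best
    return None
```

In a fuller procedure the estimated effect would be removed from the series and the search repeated, with analogous statistics for level shifts and seasonal level shifts.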
In a preferred embodiment, said automated detection comprises an identification of a time-related event, and said automated detection comprises an interaction with a user for determining whether said time-related event can be associated with an event known to the user and/or an interaction with the user for determining whether an effect of said time-related event on said time series can be compensated for by changing said time series.
In a preferred embodiment, said decomposition of said time series in step (b) and/or said decomposition of each of said one or more indicator time series is based on Seasonal and Trend decomposition using Loess (STL).
In a preferred embodiment, said decomposition of said one or more indicator time series in step (b) is followed by an independent variable analysis, preferably principal component analysis (PCA), applied to each of the indicator trend components determined in step (b) and preferably further applied to each of the indicator seasonal components determined in step (b), wherein an output of said independent variable analysis is used for said time shift in step (c).
In a preferred embodiment, said decomposition of said time series in step (b) and/or said decomposition of each of said one or more indicator time series is based on a regression method, preferably with respect to a least-squares model.
In a preferred embodiment, said plurality of physical entity input values relate to quantitative measurements with respect to a physical parameter, such as CO2 concentration measurements; wherein said output value of the physical entity relates to a predicted measurement with respect to said physical parameter, such as a predicted CO2 concentration measurement; and wherein said plurality of indicator input values of the physical entity relate to quantitative measurements with respect to a second physical parameter and preferably a third physical parameter, such as local wind speed measurements or air pollution concentration measurements.
In a preferred embodiment, said system further comprises:
- a user device comprising user input means, a processor, tangible non-volatile memory, program code present on said memory for giving instructions to said processor, and preferably a screen for displaying information to a user;
wherein said user device is intended to receive user input from the user via said user input means and send said user input to said computer device;
wherein said user device is preferably intended to display information received from said computer device to said user via said screen;
wherein said step (a) comprises retrieving data regarding a choice of said user with respect to said time series and / or said one or more indicator time series;
and wherein said step (f) preferably comprises transmitting said predicted output value to said user device for display on said screen.
In another preferred embodiment, the present invention is applied to improve market prediction, and the time series may relate to sales of a product or service over a recent time period. Here, the prediction is preferably aimed at predicting the demand for the product or service in the near future. The objective here is to include information in the form of indicator time series from external parties in the forecast.
In a related preferred embodiment, the plurality of indicator time series relates to macroeconomic indicators as well as weather information and information about the private market. The macroeconomic indicators can be obtained from sources such as EUROSTAT, Federal Reserve Economic Data (FRED), National Bank of China Statistics, etc.
The invention will be further described by the following non-limiting examples, which further illustrate the invention and are not intended to, and should not be construed to, limit the scope of the invention.
Examples
Example 1: First exemplary embodiment of the method of the present invention
In this example, the present invention is applied to improve market prediction, and the time series may relate to sales of a product or service over a recent time period. The forecast is aimed at predicting demand for the product or service in the near future. The objective here is to include information in the form of indicator time series from external parties in the forecast.
The plurality of indicator time series preferably relates to macroeconomic indicators as well as weather information and information about the private market. The macroeconomic indicators can be obtained from sources such as EUROSTAT, Federal Reserve Economic Data (FRED), National Bank of China Statistics, etc.
In general, the exemplary embodiment is aimed at making advanced predictions about the demand for a product or service using external information:
- Macroeconomic information
- Weather information
- Information about the private market
Here, advanced statistical algorithms match this external information with the user's input, i.e. the time series provided by the user.
The user can supply external events to the algorithm, which can quantify the impact of these events on the sales data in an automated way and can filter this effect out of the sales data. This corresponds to an embodiment in which automated detection of outliers is provided, wherein the automated detection comprises an identification of a time-related event and an interaction with a user for determining whether said time-related event can be associated with an event known to the user and/or for determining whether an effect of said time-related event on said time series can be compensated for by changing said time series.
In a next step, the sales data is decomposed into a trend effect, a seasonal effect, and "unexplained" noise, as is common in the STL approach.
Advanced regression methods are then applied to find time-shifted macroeconomic indicators that predict the sales trend, according to steps (c) and (d), while overfitting to the sales data is counteracted in step (e).
In this way a prediction of the sales trend is made, after which the effects of events and regular seasonality are added in a later phase.
The standard output shows a forecast of demand based on external indicators.
The output indicates which external indicators are used in the forecasting model and to what extent they influence sales, and how the cycle of the external indicator relates to the sales of the user (input).
The standard output also shows a performance comparison with typical prediction methods such as Holt-Winters, ARIMA and ETS.
Instead of being based solely on internal historical sales data, the invention therefore uses external analyses to make a prediction based on relevant external indicators. This involves automatic selection of the external factors that are relevant for the product or service, so that insights are quantified in an objective manner. This is in contrast to the standard practice of acquiring these insights qualitatively, a practice that is subject to all kinds of biases (judgmental, optimism, anchoring, etc.). The invention not only provides output for, for example, a supply chain user; the insights gained (through better insight into the market) also ensure that other user groups benefit from the prediction, such as finance, strategic marketing, sales management and account management.
With regard to information provided by the user, the following information is preferably given:
- Sales data
- Event details
- Own external indicators / choice of external indicators
On the other hand, the system preferably has access to a range of further indicators.
Below is a more detailed exemplary sequence of steps regarding the processing performed by the system:
- Information provided by the user (see above) is entered in the system.
- Automated statistical outlier detection is performed. This is done on the basis of certain patterns (additive outlier, level shift outlier, and seasonal shifts), in particular by means of ARIMA models that assess the statistical significance of these patterns through Student's t-test. The test determines the significance of these patterns relative to the explainable effects in the time series (e.g. trend, seasonality, autoregression).
- If 'suspicious' patterns arise from outlier detection, these patterns can be compensated automatically (the influence can be estimated by a statistical coefficient) or manually (the user can estimate the required compensation more accurately). This makes it possible to remove the influence of explainable events from the time series. Macroeconomic indicators cannot account for such events. Examples of these events are strikes, takeovers, the loss or acquisition of a large customer, bankruptcy of a competitor, depletion of the stock, and legal or political decisions.
- Subsequently, the remaining signal is decomposed into several components. These components are seasonal, trend and noise. This decomposition is carried out by means of an STL filter.
  o Further analysis is performed on the trend component of this signal.
  o The seasonal coefficients (which can be either additive or multiplicative) are stored. In a later phase these are added to the prediction made for the trend component.
  o The noise component is considered non-significant. This assumption is consistent with the validation for statistical outliers in an earlier step.
- The indicators (independent variables) are then decomposed using the STL filter, whereby at least the trend component is retained.
- Transformations are preferably applied to the indicators (independent variables), e.g. by principal component analysis.
- The indicators are then shifted over time in a tactical time window of for example 1 to 18 months.
- Advanced regression methods are then applied to find time-shifted macroeconomic indicators that predict the user's sales trend while avoiding over-fitting of sales figures by selecting only the most relevant indicators. Preferred methods of regression may, for example, be related to least squares models.
- A prediction of the trend for the future is generated based on the selected indicator time series and the time series of the sales figures. The prediction can possibly be expanded by making predictions of the indicators. This may relate to the latest observation, an 'expert' prediction, or a statistical prediction of this indicator.
- In the final step, the seasonality is reintroduced into the forecast, which yields the final forecast values.
Example 2: Second exemplary embodiment of the method of the present invention
In an additional second example, the plurality of input values of the physical entity relate to quantitative measurements with respect to CO2 concentration, i.e. measurements over time by means of physical sensors installed outdoors at one or more physical measurement locations for measuring the relative amount of CO2 present in the air. The output value of the physical entity here refers to a predicted future CO2 concentration measurement, for example with regard to the following months, the following year or the next five years. The plurality of indicator input values of the physical entity relate to quantitative measurements with respect to local wind speed or air pollution concentration. Here the local wind speed, and preferably also the wind direction, is observed to explain variations in the short term. On the other hand, certain air pollutants can be highly correlated with the CO2 concentration, since they can come from related combustion processes. A prediction of CO2 concentration based on both direct measurement and indirect measurement (via air pollutants) can therefore be more accurate, while the measurement of wind speed can contribute to a more robust forecast in the short term.
Example 3: Example implementation of the system of the present invention
Figures 1-4 illustrate an exemplary implementation of the system according to the present invention.
Figure 1 shows an example of the architecture of the system according to the present invention. Figure 2 gives a detailed view of this example architecture.
Figure 3 shows an example of search architecture. This preferably relates to step (a) of the present invention, wherein the search consists of retrieving said time series and / or said one or more indicator time series.
Figure 4 shows an example of a data model with regard to the present invention.
Figure 5 shows an example of a flow diagram of the method according to the present invention.
Example 4: Numeric prediction example with nine indicator time series
Figures 6-9 show a first, second, third and fourth graph 60-90 with respect to numerical examples illustrating the present invention. In this example, each of the graphs shows a sales volume with respect to a certain group of products and services from a certain company as a function of time. The scale is linear in both dimensions. Time spans a period of years, with one input value for each month. The sales volume can be expressed in a quantity related to price or turnover (in euros), but it can also relate to a total number of products or services that have been sold. In one exemplary embodiment, it is assumed that the sales volume is given with respect to a number of products of the physical entity and that the sales volume relates to the product count of the physical entity. The range of sales volume is marked by the values y1, y2 and y3, where y1 < y2 < y3 and y2 - y1 = y3 - y2. Note that preferably y1 > 0, so that each of the graphs can be considered as zoomed in on a suitable portion of the sales volume range. In an exemplary embodiment, y1 = 2,000,000, y2 = 3,000,000 and y3 = 4,000,000.
On the first graph 60, as shown in Figure 6, a first curve 81 shows the time series to be predicted, which relates to said sales volume. This relates to step (a) of the invention. The curve comprises a series of input values, preferably input values of the physical entity, with one input value per month, interpolated for the purpose of visual representation. Each input value of this time series is based on actual measurements of the sales volume, and may or may not be based on automated measurements of an item count and/or weight determination. The observed period is from 2010 to 2017. The times can relate to a specific day of the month, for example the fifteenth day of the month, where the corresponding input value is an aggregate value over the entire month.
Furthermore, a second curve 82, on the first graph 60, as shown in Figure 6, shows the trend component of this time series. This relates to a trend component that can preferably be obtained without taking into account any of the indicator time series, and which is furthermore a result without prediction. A method for obtaining this trend component can be STL. This relates to step (b) of the invention.
Prediction with regard to this time series is based on one or more indicator time series. For the selection of these one or more indicator time series, a total of about 400 indicators, preferably 393 indicators, have been selected from external databases based on a match of keywords between a query keyword and the keywords associated with each of the indicator time series. The keywords relate to different word characteristics of the company to which the time series relates. The indicator time series may also include an indicator input value for each month, or may be processed to match this requirement. This relates to step (a) of the invention.
For each of the 393 indicator time series, a decomposition is performed, e.g., by STL, to determine an indicator trend component. This relates to step (b) of the invention, and yields 393 indicator trend components.
For each of the 393 indicator trend components, a total of twelve time-shifted indicator trend components are generated, associated with lags of 0, 1, 2 ... 11 months. This relates to step (c) of the invention and yields 393 * 4 = 1572 time-shifted indicator trend components in total.
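The time shift of step (c) can be sketched as follows. This is a minimal illustration under the assumption that a lag of k months means the indicator value of month t - k is aligned with month t of the time series; the names `time_shift` and `lagged_candidates` are hypothetical.

```python
def time_shift(trend, lag):
    """Return a copy of `trend` delayed by `lag` months.

    The value at position t of the result is the original value at
    position t - lag; the first `lag` positions have no value (None).
    """
    if lag < 0:
        raise ValueError("lag must be non-negative")
    shifted = [None] * lag + list(trend)
    return shifted[:len(trend)]


def lagged_candidates(indicator_trends, lags):
    """Build one time-shifted copy per (indicator name, lag) pair."""
    return {
        (name, lag): time_shift(trend, lag)
        for name, trend in indicator_trends.items()
        for lag in lags
    }
```

Keying the candidates by (indicator, lag) keeps two different lags of the same indicator apart as separate candidates.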
For each of the 1572 time-shifted trend components, a regression score is calculated with respect to the trend component of the time series. Here, the trend component serves as a dependent variable and the time-shifted indicator trend component serves as a predictor. This relates to step (d) of the invention, and yields 1572 regression scores.
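The text does not fix a particular definition of the regression score. The sketch below assumes, purely for illustration, that the score is the coefficient of determination (R squared) of a simple least-squares regression with the time series trend as dependent variable and one time-shifted indicator trend as predictor. The function name `regression_score` is hypothetical.

```python
def regression_score(target_trend, shifted_indicator):
    """Return (r_squared, slope) of a simple least-squares fit.

    Positions where either series has no value (None) are skipped.
    The sign of the slope indicates a positive or negative relationship.
    """
    pairs = [
        (x, y)
        for x, y in zip(shifted_indicator, target_trend)
        if x is not None and y is not None
    ]
    n = len(pairs)
    if n < 3:
        return 0.0, 0.0
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    if sxx == 0 or syy == 0:
        return 0.0, 0.0  # a constant series carries no usable signal
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy)
    return r_squared, slope
```

Other scores (for example an adjusted R squared or an information criterion) would fit the same interface.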
Based on the 1572 regression scores, a number of indicator trend components are selected for prediction. This relates to step (e) of the invention. In this example, the number of selected indicator trend components is 9. Since a main consideration is a powerful prediction associated with high regression scores, the selection preferably favors indicator trend components with a high regression score. Another important consideration is the prevention of overfitting, for which the number of selected indicator trend components is usually kept low. In one embodiment, the selection may involve ranking the indicator trend components by decreasing regression score, and repeatedly adding the highest ranked remaining indicator trend component to the selection as long as a certain criterion for the risk of overfitting is met. The selection can also take into account the relationship of the indicator trend component with respect to the trend component of the time series. A positive relationship may indicate that an increase in the indicator trend component may be indicative of an increase in the time series trend component. A negative relationship can, on the other hand, indicate that an increase in the indicator trend component can be indicative of a decrease in the time series trend component.
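The ranking-based selection described above can be sketched as follows. The overfitting criterion is not specified in the text; the sketch assumes, hypothetically, a cap on the number of selected components combined with a minimum score. The name `select_components` and both default parameter values are illustrative only.

```python
def select_components(scores, max_components=9, min_score=0.5):
    """Greedy selection of time-shifted indicator trend components.

    `scores` maps a candidate key, e.g. (indicator name, lag), to its
    regression score. Candidates are ranked by decreasing score and added
    one by one until the cap is reached or the score drops below the
    threshold (a stand-in for an overfitting criterion).
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    selected = []
    for key, score in ranked:
        if len(selected) >= max_components or score < min_score:
            break
        selected.append(key)
    return selected
```

Because candidates are keyed per lag, the same indicator can be selected more than once with different lags.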
With 9 indicator trend components selected, each of them is used for the final prediction. This relates to step (f) of the invention. Note here that not only the trend component of the time series and the trend components of the indicator time series, but also the seasonal component of the time series are taken into account. In some embodiments, the seasonal components of one or more of the indicator time series may also be taken into account in the prediction.
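A strongly simplified sketch of step (f) is given below. It assumes an additive model (prediction = predicted trend + seasonal component) and, for brevity, uses a single selected time-shifted indicator trend component with a simple linear fit, whereas the example in the text uses nine. The names `fit_line` and `predict_next` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_next(trend, seasonal_pattern, indicator_trend, lag):
    """Predict the value for the first month after the observed period.

    `trend` and `indicator_trend` cover the same observed months and hold
    no missing values; `seasonal_pattern` holds one value per position in
    the seasonal cycle. Requires lag >= 1, so that the indicator value
    needed for the new month has already been observed.
    """
    if lag < 1:
        raise ValueError("this sketch requires a lag of at least one month")
    n = len(trend)
    # Training pairs: indicator at month t - lag explains trend at month t.
    xs = [indicator_trend[t - lag] for t in range(lag, n)]
    ys = [trend[t] for t in range(lag, n)]
    slope, intercept = fit_line(xs, ys)
    # Sub-step 1: predicted trend value for month n.
    trend_forecast = intercept + slope * indicator_trend[n - lag]
    # Sub-step 2: add the seasonal component for that month.
    return trend_forecast + seasonal_pattern[n % len(seasonal_pattern)]
```

With several selected components, the single-predictor fit would be replaced by a multiple regression over all selected time-shifted indicator trend components.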
Regarding the selection of the indicator trend components as described above, the 9 indicators showed the associated lag (in months), relationship (positive or negative) and variance (in percent) shown in the following table.
Indicator | Lag (months) | Relationship | Variance (%)
1 | 4 | - | 14.46
2 | 5 | - | 15.56
3 | 1 | + | 17.17
4 | 6 | - | 13.52
5 | 7 | - | 1.81
6 | 2 | + | 0.79
7 | 1 | + | 13.66
8 | 2 | - | 15.45
9 | 3 | - | 7.57
Note here that indicator 4 and indicator 5 relate to the same indicator time series, but with two different lags, or, equivalently, two different time shifts. The same applies to indicators 6 and 7, and also to indicators 8 and 9. In all three cases, the two time-shifted indicator trend components relate to the same indicator trend component, but are nevertheless different due to a difference in the time shift applied in step (c).
These indicator time series / trend components may relate, for example, to a raw material volume availability indicator and / or a weather-related indicator based on, for example, temperature measurements, or to any plurality of indicator time series comprising indicator input values of the physical entity. In addition, in one embodiment where the company relates to keywords such as steel, mines, carbon, oil, iron, automobile, construction, ships / boats and infrastructure, the following indicator time series may be (further) relevant:
- Exchange rate from Euro to national currency for the United States
- Relative importance weight (contribution to the total industrial production index): Primary metal: Equipment steel
Other indicator time series that may be further relevant include:
- Relative importance weight (contribution to the total industrial production index): Primary metal: Crude steel
- Relative importance weight (contribution to the total industrial production index): Transport equipment: Construction of ships and boats.
The second graph 70, as shown in Figure 7, includes the same two curves 81 and 85, as shown in Figure 6, for almost the same time range. However, this graph 70 includes a third curve 86, which shows the prediction that can be generated based on the 9 indicator time series. The curve 86 relates in particular to an output generated with knowledge of the 9 indicator time series, but without direct knowledge of the time series. Consequently, the curve 86 is indicative of the accuracy in approximating the time series within the available time window, which served entirely as a training series. Here, the curve 86 can relate to either a trend component or a combination of a trend component output and a seasonal component output. The great similarity between curve 86 and curve 85 indicates the strong prediction potential of the selected set of indicator time series.
The third graph 80, as shown in Figure 8, again includes the first curve 81, which is the time series. Now a time limit 84 is set at the beginning of 2015, shown as a vertical line. The graph 80 compares predictions based on the time series up to the time limit 84, according to two models. The first model is the traditional ETS model, with corresponding curve 83. The second model is the prediction according to the present invention, with corresponding curve 82. This curve is calculated taking into account at least the trend components of all nine selected indicator time series as well as the trend and seasonal components of the time series. As can be seen, the curve 82 follows the actual time series curve 81 more closely, with remarkably better performance than the ETS model directly at the start of the prediction period, i.e. shortly after the stated time limit 84. It should be noted here that the prediction of the first predicted (monthly) value is entirely based on extrapolation, i.e. only data prior to the time limit is taken into account. For later (monthly) values, values for at least a part of the indicator time series are also available beyond the time limit. This is apparent from the above table, with a lag of only 1 month for indicators 3 and 7. With such a small lag, new indicator time series data can be quickly taken into account as they become available, which is another advantage of the present invention. This contrasts with the ETS approach, which cannot take into account possible updates of indicator time series, since it is not based on them.
The fourth graph 90, as shown in Figure 9, includes the original time series curve 81. Note here that this curve 81 does not extend to the end of the time scale, but instead up to and including November 2016. From December 2016 onward, predictions are made by the ETS model, according to curve 82', and according to the present invention, according to curve 83'. For the ETS model, the curve only starts where the time series ends. For the curve 83' according to the present invention, values within the test series are also calculated. This curve is calculated taking into account at least the trend components of all nine selected indicator time series as well as the trend and seasonal components of the time series. Although the fourth graph 90 does not indicate which prediction is best, the following table shows the results obtained in detail.
Date | Forecast invention | ETS forecast | Actual figures
Dec '16 | 1 752 834 | 1 802 078 | 2 412 091
Jan '17 | 2 833 711 | 2 549 856 | 2 733 197
Feb '17 | 2 660 411 | 2 362 243 | 2 711 745
Mar '17 | 2 815 350 | 2 533 429 | 3 207 108
Apr '17 | 2 582 948 | 2 381 840 | 2 880 130
In terms of relative error, this yields the following.
Date | Error rate invention | Error rate ETS
Dec '16 | 27.30% | 25.30%
Jan '17 | 3.70% | 6.70%
Feb '17 | 1.90% | 12.90%
Mar '17 | 12.20% | 21.00%
Apr '17 | 10.30% | 17.30%
Average | 11.10% | 16.60%
The aforementioned average refers to the average over said 5-month period. This means an average reduction in prediction error of 33.3%, or, equivalently, an improvement in accuracy of 33.3%.
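The error rates and their averages can be recomputed from the forecast table. The sketch below assumes that the three numeric columns of that table are, in order, the forecast according to the invention, the ETS forecast and the actual figure, which is the assignment that reproduces the stated error rates. The relative error is taken as the absolute deviation divided by the actual figure; all variable names are illustrative.

```python
# (date, forecast invention, ETS forecast, actual figure)
ROWS = [
    ("Dec '16", 1_752_834, 1_802_078, 2_412_091),
    ("Jan '17", 2_833_711, 2_549_856, 2_733_197),
    ("Feb '17", 2_660_411, 2_362_243, 2_711_745),
    ("Mar '17", 2_815_350, 2_533_429, 3_207_108),
    ("Apr '17", 2_582_948, 2_381_840, 2_880_130),
]


def relative_error(forecast, actual):
    """Absolute forecast error as a fraction of the actual value."""
    return abs(forecast - actual) / actual


errors_invention = [relative_error(inv, actual) for _, inv, _, actual in ROWS]
errors_ets = [relative_error(ets, actual) for _, _, ets, actual in ROWS]

avg_invention = sum(errors_invention) / len(ROWS)
avg_ets = sum(errors_ets) / len(ROWS)

# Relative reduction of the average error compared with ETS.
reduction = 1 - avg_invention / avg_ets

for (date, *_), e_inv, e_ets in zip(ROWS, errors_invention, errors_ets):
    print(f"{date}: invention {e_inv:.1%}, ETS {e_ets:.1%}")
print(f"Average: invention {avg_invention:.1%}, ETS {avg_ets:.1%}")
print(f"Reduction in average error: {reduction:.1%}")
```

Rounded to one decimal, the averages match the stated 11.1% and 16.6%, and the reduction comes out at roughly one third, in line with the stated 33.3% (the last digit depends on where rounding is applied).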
Example 5: Numerical prediction example with regard to CO2 concentration
Figures 6-9 show a first, second, third and fourth graph 60-90 with respect to numerical examples illustrating the present invention. In this example, each of the graphs shows a CO2 concentration measurement as a function of time, including a prediction of CO2 concentration measurements in Figures 8 and 9. The CO2 concentration measurement refers to a physical measurement by a sensor located at a specific single location. The exact location of the sensor clearly has a major impact on the measurements. For example, the sensor can be very close to or on an industrial site with production that is subject to large seasonal variations, as well as long-term trends. In such an environment, the CO2 concentration may be modulated by seasonal variations and production trends. Alternatively, the sensor can be very close to or in a green environment such as a forest, which will clearly have an impact on the CO2 concentration. More specifically, weather conditions may allow vegetation to absorb more or less of the CO2 present in the air. Moreover, the sensor can be placed in open air, but it can also be placed in a partially closed room with little ventilation, which can lead to some lag in the measurements.
Just as in Example 4, time spans a period of years, with one input value for each month. The difference with Example 4 lies in the values that have been determined, which now relate to CO2 concentration measurements. The range of CO2 concentration measurements considered is indicated by the values y1, y2 and y3, where y1 < y2 < y3, and where y2 - y1 = y3 - y2. Just as in Example 4, preferably y1 > 0, and therefore each of the graphs can be considered as zooming in on a suitable area of the CO2 concentration range. In an exemplary embodiment, y1 = 200, y2 = 300 and y3 = 400, all expressed in parts per million (ppm).
Just as in Example 4, the first curve 81, on the first graph 60, as shown in Figure 6, shows the time series to be predicted, which here, however, relates to the CO2 concentration measurements of the sensor instead of said sales volumes. This relates to step (a) of the invention. Note that this measurement curve is purely illustrative and preferably relates to local measurements. In no case does it relate to the evolution of the global average CO2 concentration, which is outside the scope of the present invention.
Prediction with regard to this time series is based on one or more indicator time series. For the selection of these one or more indicator time series, a large number of potential indicators can again be considered. This relates to step (a) of the invention. In this example, the plurality of indicator input values of the physical entity relate to quantitative measurements with respect to a second physical parameter and preferably third physical parameter such as local wind speed measurements or air pollution concentration measurements, preferably measured at different locations near said sensor location. Such measurements may be good predictors of the CO2 concentration measured by the sensor.
The remainder of this example is analogous to Example 4. For each of the indicator time series, a decomposition is performed, e.g., by STL, to determine an indicator trend component. This relates to step (b) of the invention. For each of the indicator trend components, a total of twelve time-shifted indicator trend components are generated, associated with lags of 0, 1, 2 ... 11 months. This relates to step (c) of the invention. For each of the time-shifted trend components, a regression score is calculated with respect to the trend component of the time series. This relates to step (d) of the invention. Based on the regression scores, a (preferably small) number of indicator trend components are selected for prediction. This relates to step (e) of the invention. With the indicator trend components selected, each of them is used for the final prediction. This relates to step (f) of the invention. Details and numerical values of each step can be obtained from Example 4. In view of the chosen values for y1, y2 and y3, compared to Example 4, this means a scaling factor of 10,000, i.e. any amount indicated in Example 4 can be divided by 10,000 to obtain the amounts illustrating Example 5.
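The rescaling between the two examples amounts to a single division. The short sketch below applies it to the axis values of Example 4; the names `SCALE` and `to_example5` are illustrative.

```python
SCALE = 10_000  # scaling factor between Example 4 and Example 5


def to_example5(amount):
    """Convert an amount from Example 4 into the ppm value of Example 5."""
    return amount / SCALE


example4_axis = {"y1": 2_000_000, "y2": 3_000_000, "y3": 4_000_000}
example5_axis = {name: to_example5(value) for name, value in example4_axis.items()}
print(example5_axis)  # axis values in ppm: 200, 300 and 400
```

Since relative errors are unaffected by a common scaling factor, the error rates reported for Example 4 carry over unchanged.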
It is understood that the present invention is not limited to any embodiment described above and that certain modifications may be made to the present examples without departing from the scope of the appended claims. For example, the present invention has been described with reference to examples of input values of a physical entity with respect to a measurement of a physical parameter, but it is clear that the invention can be applied to any input values of a physical entity and any indicator input values of a physical entity.
Claims:
Claims (17)
[1]
A computer-implemented method for predicting a time series based on one or more indicator time series;
wherein said time series relates to an observed period, wherein the time series comprises a plurality of input values from a physical entity associated with a plurality of times occurring during said observed period;
wherein said one or more indicator time series each relates to an associated indicator period, each indicator period at least partially overlapping with said observed period, wherein each indicator time series comprises a plurality of indicator input values of the physical entity associated with a plurality of indicator times that occur during the aforementioned associated indicator period;
the method comprising the following steps:
(a) receiving said time series and said one or more indicator time series;
(b) resolving said time series into at least one trend component and a seasonal component, and resolving each of said one or more indicator time series into at least one indicator trend component;
(c) time-shifting each indicator trend component from its associated indicator time period to said observed period;
(d) calculating a regression score with respect to said trend component for each of said time-shifted indicator trend components, the trend component serving as a dependent variable and the time-shifted indicator trend component serving as a predictor;
(e) based on the regression scores of each of said time-shifted indicator trend components, selecting one or more of said time-shifted indicator trend components for prediction, said selection being aimed at least at preventing overfitting with respect to said time series;
(f) predicting an output value of a physical entity with respect to said time series, wherein said output value relates to a time moment occurring later than said observed period; wherein said prediction is based on the trend component and the seasonal component of the time series as well as on said time-shifted indicator trend components selected in step (e).
[2]
A method according to claim 1, wherein said one or more indicator time series relate to two or more indicator time series, and wherein preferably said selecting in step (e) comprises omitting at least one of said time-shifted indicator trend components from the selection mentioned.
[3]
Method according to claims 1-2, wherein said selecting in step (e) comprises comparing said regression scores and / or a normalized form of said regression scores with a predetermined threshold value and / or wherein said selecting is further based on a number of already selected time-shifted indicator trend components when determining whether an additional time-shifted indicator trend component can be selected, wherein preferably a tendency for selecting an additional time-shifted indicator trend component decreases monotonically with said number of already selected time-shifted indicator trend components.
[4]
The method of any one of claims 1-3, wherein said step (b) further comprises resolving each of said indicator time series into an indicator seasonal component, said prediction in step (f) being further based on the indicator seasonal components associated with the time-shifted indicator trend components selected in step (e), preferably wherein said prediction in step (f) is further based on time-shifted indicator seasonal components that are time-shifted from the associated indicator time period to the observed period and associated with the time-shifted indicator trend components selected in step (e).
[5]
The method of claim 4, wherein said step (f) comprises predicting an indicator output value of a physical entity for at least one of said time-shifted indicator trend components selected in step (e), based on at least said one time-shifted indicator trend component and its associated indicator seasonal component and / or time-shifted indicator seasonal component, and wherein said prediction of said output value in step (f) is further based on said predicted indicator output value.
[6]
The method of any one of claims 1-5, wherein said time shift of said each indicator trend component in step (c) is performed from its associated time period to a predetermined tactical period different from said observed period, and wherein the said calculation of regression scores in step (d) and / or said prediction in step (f) are preferably made in view of said tactical period instead of said observed period, said tactical period preferably not being smaller than 1 month and preferably no more than 18 months.
[7]
The method of any one of claims 1-6, wherein said prediction in step (f) comprises the following sub-steps
- calculating an output trend value of the physical entity with respect to said trend component, wherein said output trend value relates to a time moment occurring later than said observed period; wherein said calculation is based on the trend component as well as on said time-shifted indicator trend components selected in step (e);
- calculating the output value of the physical entity based on said output trend value of the physical entity and said seasonal component.
[8]
A method according to any one of claims 1-7, wherein said resolving in step (b) is preceded by an automated detection of one or more outliers, said one or more outliers preferably being additive outliers and / or level shift outliers and / or seasonal shift outliers.
[9]
The method of claim 8, wherein said automated detection is based on an autoregressive integrated moving average (ARIMA) model and / or wherein said automated detection comprises applying Student's t-test, preferably wherein said automated detection is based on a combination of the aforementioned autoregressive integrated moving average (ARIMA) model and the aforementioned Student's t-test.
[10]
The method of claims 8-9, wherein said automated detection comprises an identification of a time-related event, and wherein said automated detection comprises an interaction with a user for determining whether said time-related event can be associated with a known event that is known to the user and / or an interaction with the user for determining whether an effect of said time-related event on said time series can be compensated for by changing said time series.
[11]
The method according to claims 1-10, wherein said resolving of said time series in step (b) and / or said resolving of each of said one or more indicator time series is based on Seasonal and Trend decomposition using Loess (STL) and / or on a regression method, preferably with respect to a least squares model.
[12]
A method according to claims 1-11, wherein said resolving of said one or more indicator time series in step (b) is followed by an independent variable analysis, preferably principal component analysis (PCA), applied to each of the indicator trend components determined in step (b) and preferably further applied to each of the indicator seasonal components determined in step (b), and wherein an output of said independent variable analysis is used for said time shift in step (c).
[13]
The method of claims 1-12, wherein said plurality of physical entity input values relate to quantitative measurements with respect to a physical parameter such as CO2 concentration measurements; wherein said output value of the physical entity is related to a predicted measurement with respect to said physical parameter such as a predicted CO2 concentration measurement; and wherein said plurality of indicator input values of the physical entity are related to quantitative measurements with respect to a second physical parameter and preferably a third physical parameter such as local wind speed measurements or air pollution concentration measurements.
[14]
The method of claims 1-13, wherein at least two of said time-shifted indicator trend components selected in step (e) relate to the same indicator trend component, yet are different due to a difference in the time shift applied in step (c).
[15]
A prediction system, wherein said system comprises
- a communication module that has access to:
o a database comprising said time series that relates to an observed period, the time series comprising a plurality of input values from a physical entity associated with a plurality of times occurring during said observed period;
o one or more indicator databases, preferably two or more indicator databases, comprising one or more indicator time series, preferably two or more indicator time series, each relating to an associated indicator period, wherein each indicator period at least partially overlaps with said observed period, each indicator time series comprising a plurality of physical entity indicator input values associated with a plurality of indicator times occurring during said associated indicator period;
- a computer device comprising a processor, tangible non-volatile memory, program code present on said memory for giving instructions to said processor;
wherein the communication module is adapted to provide access to said database and said one or more indicator databases to said computer device;
wherein said computer device is configured to perform a method of prediction with respect to said time series based on said one or more indicator time series, preferably upon request by a user who specifies at least one of said time series and said one or more indicator time series via user input means; said method comprising the following steps:
(a) retrieving said time series from said database and retrieving said one or more indicator time series from said one or more indicator databases;
(b) resolving said time series into at least one trend component and a seasonal component, and resolving each of said one or more indicator time series into at least one indicator trend component;
(c) time-shifting each indicator trend component from said associated indicator time period to said observed period;
(d) calculating a regression score with respect to said trend component for each of said time-shifted indicator trend components, the trend component serving as a dependent variable and the time-shifted indicator trend component serving as a predictor;
(e) based on the regression scores of each of said time-shifted indicator trend components, selecting one or more of said time-shifted indicator trend components for prediction, said selection being aimed at least at preventing overfitting with respect to said time series;
(f) predicting an output value of a physical entity with respect to said time series, wherein said output value relates to a time moment occurring later than said observed period; wherein said prediction is based on the trend component and the seasonal component of the time series as well as on said time-shifted indicator trend components selected in step (e).
[16]
The system of claim 15, wherein said system further comprises:
- a user device comprising user input means, a processor, tangible non-volatile memory, program code present on said memory for giving instructions to said processor, and preferably a screen for displaying information to a user;
wherein said user device is adapted to receive user input from the user via said user input means and send said user input to said computer device;
wherein said user device is preferably adapted to display information received from said computer device to said user via said screen;
wherein said step (a) comprises retrieving data regarding a choice of said user with respect to said time series and / or said one or more indicator time series;
and wherein said step (f) preferably comprises transmitting said predicted output value to said user device for display on said screen.
[17]
Use of a method according to claims 1-14 in a system according to claims 15-16.
Patent family:
Publication number | Publication date
BE1026185A1|2019-10-29|
Cited documents:
Publication number | Filing date | Publication date | Applicant | Title
EP1207475A2|2000-09-21|2002-05-22|Ricoh Company, Ltd.|System and method for providing environmental impact information, recording medium recording the information, and computer data signal|
Legal status:
2019-11-27| FG| Patent granted|Effective date: 20191104 |
Priority:
Application number | Filing date | Title
BE201805234|2018-04-05|
BE2018/5234|2018-04-05|